Abstract

Monads have taken the world by storm, and are supported by do- 
notation (at least in Haskell). Programmers are increasingly waking 
up to the usefulness and ubiquity of Applicatives, but they have 
so far been hampered by the absence of supporting notation. In 
this paper we show how to re-use the very same do-notation to
work for Applicatives as well, providing efficiency benefits for 
some types that are both Monad and Applicative, and syntactic 
convenience for those that are merely Applicative. The result is 
fully implemented in GHC, and is in use at Facebook to make it 
easy to write highly-parallel queries in a distributed system. 

1. Introduction 

Consider this Haskell function that calculates the number of com- 
mon friends between two Facebook users: 

numCommonFriends :: Id -> Id -> Haxl Int
numCommonFriends x y = do
  fx <- friendsOf x
  fy <- friendsOf y
  return (length (intersect fx fy))

Here friendsOf is an operation that makes a remote query to 
a database to fetch the list of friends of a user. Desugaring the 
monadic do expression according to the Haskell standard
yields this:

numCommonFriends x y =
  friendsOf x >>= \fx ->
  friendsOf y >>= \fy ->
  return (length (intersect fx fy))

where >>= and return are operations from the Monad class. This 
translation works fine, but it is inherently sequential : the second 
call to friendsOf cannot start until the first returns, because the 
result of the first call, namely fx, is in scope at the second call so 
in principle might be used by it. But, tantalisingly, fx manifestly 
isn’t used by the second call, so we actually could run the two in
parallel.

Marlow et al. [11] showed how to exploit this parallelism by
using McBride and Paterson’s insight that between a Functor and
a Monad lies an Applicative [13]. To be concrete, we can rewrite
numCommonFriends using Applicative combinators like this:

numCommonFriends :: Id -> Id -> Haxl Int
numCommonFriends x y =
  (\fx fy -> length (intersect fx fy))
    <$> friendsOf x
    <*> friendsOf y

The combinators <$> and <*> are defined in Figure 1, but for now
we simply note that the two calls to friendsOf are now manifestly
independent of one another. And indeed the implementation of the
Haxl monad¹ will take advantage of that independence to perform
the two friendsOf queries in parallel; in fact it collects them 
together and batches them into a single query. 

But there is still a problem; programmers should not have to spot 
where they can use <*> to gain its advantages, because they are 
likely to miss some opportunities, especially when code is refac- 
tored. Moreover there are maintainability and comprehensibility 
benefits in using a single universal notation, namely do notation. 
In this paper we show how to have our cake and eat it too: the 
programmer writes do notation, and the compiler desugars it au- 
tomatically into the efficient parallel code that uses Applicative 
combinators. We make these contributions: 

• Rather than desugaring do notation uniformly into Monad combinators,
we show how to take advantage of the program’s dependency
structure to selectively use Applicative combinators
instead (Section 2.1). For some types that are both Monad
and Applicative, this provides efficiency benefits at runtime 
without losing any maintainability or clarity in the source code. 
For types that are Applicative but not Monad, we gain access 
to the do notation, providing a syntactic convenience. 

• The more we can use Applicative combinators, the better. 
But as we show in Section 2.4, there may be more than one way
to desugar a do-expression into Applicative combinators, 
none of which is universally best. We propose a definition of 
optimality by fixing a set of assumptions. 

• We present a detailed translation of Haskell's do-notation into
Applicative operations (Section 3), using our definition of optimality.
This translation proceeds by way of an
independently-interesting elaboration of the do-notation.

• We present an implementation of the described translation in
the Glasgow Haskell Compiler (Section 5), and measure its effectiveness
on existing widely-used open-source Haskell code,
and a large codebase at scale.

• The Haxl monad is not the only abstraction where using 
Applicative combinators leads to more efficient code than 
the equivalent expression written using Monad combinators. 
We give some more examples in Section 6.
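To make the "syntactic convenience" contribution concrete, here is a minimal sketch using a type that is Applicative but not Monad. It assumes the translation is switched on with GHC's ApplicativeDo language pragma (the pragma name is not given in this section); the names zipped and zippedExplicit are ours, chosen for illustration.

```haskell
{-# LANGUAGE ApplicativeDo #-}
-- Sketch: ZipList has an Applicative instance but no Monad instance, so the
-- do block below only type-checks if it is desugared to <$> and <*>.
import Control.Applicative (ZipList (..))

-- Illustrative data; these names are not from the paper.
zipped :: ZipList (Int, Char)
zipped = do
  n <- ZipList [1, 2, 3]
  c <- ZipList "abc"
  pure (n, c)

-- The same value written with explicit Applicative combinators.
zippedExplicit :: ZipList (Int, Char)
zippedExplicit = (,) <$> ZipList [1, 2, 3] <*> ZipList "abc"
```

Loading this in GHCi and evaluating getZipList zipped should pair the elements positionally, exactly as getZipList zippedExplicit does.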

1 https://github.com/facebook/haxl

class Functor f where
  fmap :: (a -> b) -> f a -> f b

class Functor f => Applicative f where
  pure  :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b

class Applicative f => Monad f where
  return :: a -> f a
  (>>=)  :: f a -> (a -> f b) -> f b

(<$>) :: Functor f => (a -> b) -> f a -> f b
(<$>) = fmap

ap :: (Monad m) => m (a -> b) -> m a -> m b
ap mf mx = mf >>= \f -> mx >>= \x -> return (f x)

join :: (Monad m) => m (m a) -> m a
join x = x >>= id

Laws used in this paper

f <$> m       =  pure f <*> m
<*>           =  ap
pure          =  return
pure r >>= f  =  f r

Figure 1. Definitions of Functor, Applicative, Monad, <$>, ap,
and join


2. The main idea

In 1992 Wadler suggested using monads as a programming abstraction,
conveniently embodied as a type class Monad in Haskell.
Monads took the world by storm, and have appeared in many other
languages.

Sixteen years later, McBride and Paterson discovered another
key abstraction, which they called applicative functors [13], embodied
by the Applicative type class. The Applicative class
sits between Functor and Monad in the class hierarchy; every
Monad is an Applicative and every Applicative is a Functor,
but the reverse of these is not necessarily true. Figure 1 gives the
definitions of these classes for easy reference.

This paper is based on two observations. Firstly, it would be
convenient to be able to write Applicative expressions using do
notation. For example, given an effectful map written using do
notation like this:

mapM f []     = pure []
mapM f (x:xs) = do x' <- f x
                   xs' <- mapM f xs
                   pure (x' : xs')

we would like GHC to infer this type and desugaring for it:

mapM :: Applicative m => (a -> m b) -> [a] -> m [b]
mapM f []     = pure []
mapM f (x:xs) = (:) <$> f x <*> mapM f xs

Notice that, despite the use of do-notation, the inferred type indicates
that mapM works for any Applicative, not just for any
Monad, and so will work for a significantly wider range of types.

The second, and more important, observation is that in some
Monads the Applicative <*> operation is more efficient than
the equivalent Monad ap operation. Exploiting this performance
difference currently requires the programmer to spot where they
can use <*> and refactor their code to use it, but, like other compiler
optimisations, we would prefer the compiler to automatically take
advantage of <*> whenever it can. The approach we take is to have
the compiler desugar do notation into uses of the Applicative
operations where possible, falling back to Monad when necessary.

This second observation is the strongest motivator for this work:
the Haxl monad (called Fetch in previous work [11]) provides parallelism
between data-fetching operations when the <*> operator is
used. But programmers should not have to think about where to use
<*>. Indeed, we would prefer not to use <*> explicitly in our code
at all, because it is sensitive to refactoring: introducing or removing
dependencies between expressions affects where we can use <*>,
and if the programmer is responsible for using <*> then not only
do they have to spend time thinking about it, but they are likely to
do an imperfect job. Thus we would like programmers to be able to
use a simple universal syntax, so that they can focus on correctness
while letting the compiler exploit parallelism as far as possible. The
translation we present in this paper achieves this: Haxl programmers
use do notation, and the compiler automatically extracts the
available parallelism. This translation is used in a system at Facebook,
and results in significant performance gains (Section 5.5).


2.1 The challenge

The challenge is this: given an arbitrary expression in do-notation,
we would like to translate it into an expression that, wherever
possible, uses operations from the Applicative class rather than
the Monad class. For reference, the definitions² of the Functor,
Monad, and Applicative type classes as provided in GHC 8.0.1³
are given in Figure 1, along with the auxiliary functions <$> (an
infix spelling of fmap), and ap.

2 For simplicity we have omitted default definitions, and the operators $>,
>>, *>, and <*.

3 After extensive user debate, GHC has diverged from the Haskell 2010
specification by adding the new class Applicative as a superclass of
Monad.

Figure 1 also gives the laws that are expected to hold for instances
of Monad and Applicative. For example, in many monads
<*> is defined to be ap; but even where it has a more efficient
implementation the second law says that its semantics should be
the same as ap. Nothing enforces these laws, but our alternative
desugaring is only semantics-preserving if these laws hold for the
relevant instances of Functor, Applicative, and Monad.

Before we give the translation scheme in full in Section 3, we
will motivate our design through a series of examples. First, a
straightforward example involving two independent statements:

do x1 <- A
   x2 <- B
   return (x1,x2)

where A and B are arbitrary expressions, and B does not mention x1.
The normal desugaring of this expression, according to the Haskell
2010 Report, would yield this expression:

A >>= \x1 ->
B >>= \x2 ->
return (x1,x2)

Using <$> and <*> instead gives us:

(,) <$> A <*> B

This is semantically equivalent, as you can check for yourself using
the laws given in Figure 1 plus the definition of ap.

Next, let us modify the original expression so that the expression
B mentions the variable x1:

do x1 <- A
   x2 <- B[x1]    -- An expression B mentioning x1
   return (x1,x2)

There is no way to desugar this expression into a use of the <*>
operator as before, because there is now a dependency between B
and A. We can see that from the types of <*> and >>=:

(<*>) :: Applicative f => f (a -> b) -> f a -> f b
(>>=) :: Monad f => f a -> (a -> f b) -> f b

The type of >>= allows the second computation (f b) to depend on
the result a of the first, whereas <*> does not. This is the essence
of the difference between Monad and Applicative; Monad allows
dependencies on previous results, whereas Applicative does not.
So we must desugar the example to

A >>= \x1 -> B[x1] >>= \x2 -> return (x1,x2)

In short: whenever there is a dependency between two statements in 
a do-notation expression, our translation must use >>= somewhere. 
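The equivalence of the two desugarings for independent statements can be checked on concrete monads. The following is a small sketch, not part of the translation itself; the function names monadic, applicative and viaAp are ours. Note that applicative needs only an Applicative constraint, whereas the other two need Monad.

```haskell
import Control.Monad (ap)

-- Standard desugaring of: do { x1 <- a; x2 <- b; return (x1,x2) }
monadic :: Monad m => m a -> m b -> m (a, b)
monadic a b = a >>= \x1 -> b >>= \x2 -> return (x1, x2)

-- Applicative desugaring of the same block.
applicative :: Applicative f => f a -> f b -> f (a, b)
applicative a b = (,) <$> a <*> b

-- The applicative form with <*> replaced by ap and <$> by pure/ap,
-- as the first two laws permit.
viaAp :: Monad m => m a -> m b -> m (a, b)
viaAp a b = (return (,) `ap` a) `ap` b
```

For the list monad, for instance, all three functions applied to [1, 2] and "ab" enumerate the same pairs in the same order.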

2.2 Mixing it up 

However, it’s not an either/or choice: we may be able to desugar in 
a way that uses <*> in some places and >>= in others. For example: 

do x1 <- A
   x2 <- B
   x3 <- C[x1]
   x4 <- D[x2]
   return (x3,x4)

Here we have two pairs of statements, A and B, and C and D. The
statements in each pair are independent, but C and D depend on the
results of A and B respectively. So we can do A and B applicatively in
parallel, gather the results with >>=, and then do C and D in parallel.
Here's a picture to show what we mean: 

[Diagram: A and B feed a pairing node (,); that pair feeds C and D, whose results are paired by a second (,) to give the result.]

and we use the informal notation (A | B); (C | D) to describe this 
structure. But it’s not really informal: we can express it directly 
using our combinators as follows 

((,) <$> A <*> B)
  >>=
    \(x1,x2) -> (,) <$> C[x1] <*> D[x2]

The first line does A and B in parallel, building a result pair
(x1,x2); then comes a monadic bind; then we match the pair
and do C and D in parallel. The important point is that we use the
applicative <*> where possible, and the monadic >>= where necessary.

2.3 Accounting for effects 

Looking again at the example in the previous section, there is an 
alternative execution plan that would respect the data dependencies: 

A -> C

B -> D

or, in our informal notation (A; C) | (B; D). After all, the data depen- 
dencies only require that C occurs after A, and D after B. Moreover, 
this appears to be a better plan than the one in Section 2.2, because
it removes an apparently-unnecessary synchronisation point. To see
why it is better, suppose A and D take two seconds each and B and C
both take one second. Then the above plan takes three seconds, but
the one in Section 2.2 takes four.

Alas we cannot use this more efficient execution plan, though, 
because it amounts to swapping the order of B and C. The corre- 
sponding applicative expression is this: 

(,) <$> (A >>= \x1 -> C[x1])
    <*> (B >>= \x2 -> D[x2])

but this is not semantically equivalent to the original do-notation
expression. Imagine executing it under a State monad, for exam- 
ple: the effects would appear in the order A, C, B, D, and the program 
may give different results. 

Reordering the statements is only valid in a commutative 
monad, where the order of effects is not observable. The Haxl 
monad is not commutative, because it supports effects in the form 
of exceptions, so reordering statements can change which exceptions
are reported. In our design, we therefore never reorder computations.
We leave for future work the possibility of allowing
reordering for commutative monads.

Even though our transformation does no automatic re-ordering, 
the programmer is free to do so manually, by writing: 

do x1 <- A
   x3 <- C[x1]
   x2 <- B
   x4 <- D[x2]
   return (x3,x4)

Now our transformation will be able to produce the more efficient 
result. 
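The effect-ordering argument can be observed directly in a monad whose effects are visible. The sketch below uses the pair type ([String], a) from base as a simple logging monad (an illustrative stand-in, not the Haxl monad); step and the stmtA to stmtD definitions are hypothetical placeholders for A, B, C and D.

```haskell
-- ([String], a) is both Applicative and Monad because [String] is a Monoid;
-- the first component records the order in which effects happen.
type Logged a = ([String], a)

-- Hypothetical effectful step: log a name and return a value.
step :: String -> a -> Logged a
step name value = ([name], value)

stmtA, stmtB :: Logged Int
stmtA = step "A" 1
stmtB = step "B" 2

stmtC, stmtD :: Int -> Logged Int
stmtC x1 = step "C" (x1 * 10)
stmtD x2 = step "D" (x2 * 10)

-- The original block, read monadically.
original :: Logged (Int, Int)
original = do
  x1 <- stmtA
  x2 <- stmtB
  x3 <- stmtC x1
  x4 <- stmtD x2
  return (x3, x4)

-- The plan (A | B); (C | D): same order of effects as the original.
grouped :: Logged (Int, Int)
grouped =
  ((,) <$> stmtA <*> stmtB) >>= \(x1, x2) -> (,) <$> stmtC x1 <*> stmtD x2

-- The plan (A; C) | (B; D): same value, but the effects are reordered.
reordered :: Logged (Int, Int)
reordered = (,) <$> (stmtA >>= stmtC) <*> (stmtB >>= stmtD)
```

Here grouped equals original, while reordered returns the same pair with the log in the order A, C, B, D, which is why the translation may not choose that plan on its own.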

2.4 There is no single best translation 

Consider this example: 

do x <- A
   y <- B
   z <- C[x]
   return (z+y)

There are two ways that we might consider implementing this, in 
our informal notation: 

1. (A | B); C

2. A; (B | C) 

Which one is better? Alas, it depends on the relative execution 
times of A, B, and C. Imagine a parallel execution model where 
we can determine the overall execution time (which we will call 
“cost”) by interpreting “|” as maximum and “;” as addition, so
the cost of (1) is max(A, B) + C, and the cost of (2) is A + max(B, C).
Now let’s assign some example costs to A, B, C: 

• A = 1, B = 1, C = 1: both alternatives have equal cost, 2. 

• A = 0, B = 2, C = 1: (1) has cost 3, (2) has cost 2. 

• A = 1, B = 2, C = 0: (1) has cost 2, (2) has cost 3. 

Which translation is better thus depends on the relative costs of
A, B, and C.
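The cost model is small enough to write down as a sketch. The Plan type and the names planCost, plan1, plan2 and costs are ours, introduced only to recompute the three cases listed above.

```haskell
-- Execution plans in the informal notation.
data Plan
  = Leaf Int      -- a single statement with the given cost
  | Seq Plan Plan -- p ; q
  | Par Plan Plan -- p | q

-- ";" is interpreted as addition and "|" as maximum.
planCost :: Plan -> Int
planCost (Leaf c) = c
planCost (Seq p q) = planCost p + planCost q
planCost (Par p q) = max (planCost p) (planCost q)

-- The two candidate plans, parameterised by the costs of A, B and C.
plan1, plan2 :: Int -> Int -> Int -> Plan
plan1 a b c = Seq (Par (Leaf a) (Leaf b)) (Leaf c) -- (A | B); C
plan2 a b c = Seq (Leaf a) (Par (Leaf b) (Leaf c)) -- A; (B | C)

-- Costs of plan (1) and plan (2) for the three example assignments.
costs :: [(Int, Int)]
costs =
  [ (planCost (plan1 a b c), planCost (plan2 a b c))
  | (a, b, c) <- [(1, 1, 1), (0, 2, 1), (1, 2, 0)]
  ]
```

Evaluating costs reproduces the table: the plans tie in the first case, plan (2) wins in the second, and plan (1) wins in the third.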

We cannot have complete knowledge of the costs of the state- 
ments in a do: consider the case where the statements are lambda- 
bound variables, for example. Therefore it is not possible to find 
an optimal translation in general. Our scheme uses a conservative 
definition of “optimal” wherein each statement is assumed to have 
equal cost.

Variables
  v

Expressions
  e ∈ Expr ::= v
             | e1 e2
             | \p -> e
             | (e1, ..., en)        n >= 2
             | do l e
             | ...

Patterns
  p ∈ Pat ::= v
            | (p1, ..., pn)         n >= 2
            | ...

Statement sequences
  l ∈ Stmts ::= {s1; ...; sn}       n >= 1

Statements
  s ∈ Stmt ::= (l1 | ... | ln)      n >= 2
             | p <- e               Monadic bind
             | e                    Guard
             | let bind in e

Figure 2. Syntax


desugar_std (do {e}) = e

desugar_std (do {p <- e; s}) = e >>= \p -> desugar_std (do {s})


Figure 3. Haskell 2010 desugaring of do-notation 


2.5 Optimising the translation 

In Section 2.2 we built a pair of results from A and B, used >>=, and
pattern-matched the resulting pair. Here is an alternative and neater
translation using the join combinator (see Figure 1):

join ((\x1 x2 -> (,) <$> C[x1] <*> D[x2])
        <$> A
        <*> B)

By using join we avoid the intermediate pair (x1,x2). One
should think of join as a more flexible >>=, and in fact in our 
translation we shall be using join instead of >>= in this way. 
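As a quick check that the two formulations agree, here is a sketch that abstracts A, B, C and D as parameters; the names viaPair and viaJoin are ours.

```haskell
import Control.Monad (join)

-- The plan (A | B); (C | D) via an intermediate pair and >>=.
viaPair :: Monad m => m a -> m b -> (a -> m c) -> (b -> m d) -> m (c, d)
viaPair a b c d =
  ((,) <$> a <*> b) >>= \(x1, x2) -> (,) <$> c x1 <*> d x2

-- The same plan via join: the outer applicative expression produces a
-- computation, and join runs it. No intermediate pair is built.
viaJoin :: Monad m => m a -> m b -> (a -> m c) -> (b -> m d) -> m (c, d)
viaJoin a b c d =
  join ((\x1 x2 -> (,) <$> c x1 <*> d x2) <$> a <*> b)
```

In any lawful monad the two should coincide; trying them on Maybe or list arguments in GHCi is an easy way to convince oneself.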

3. The new desugaring algorithm 

In this section we formalise our new desugaring algorithm for do 
notation. It proceeds in two stages: 

• Rearrangement (Section 3.2). The first stage corresponds to our
informal execution plan. It takes a sequence of statements
s1, ..., sn and groups them into parallel blocks (Section 3.1),
thus building a tree. Rearrangement does not re-order the statements,
merely groups them; flattening the tree returns the original
statement sequence.

• Desugaring (Section 3.3). The second stage turns this tree of
statements into an expression using <*>, <$>, >>=, and join.

Before presenting rearrangement and desugaring in detail, we first
present an extended syntax for do-notation in Section 3.1. This
syntax serves as a bridge between the two stages of the algorithm,
capable of expressing the choices made by rearrangement without
the noise introduced by desugaring.

For comparison, the standard Haskell 2010 desugaring for do
expressions is given in Figure 3 (using the Haskell Report’s abstract
syntax which does not distinguish the final return, unlike ours).
For simplicity we ignore refutable patterns for now, but we return
to them in Section 3.7.

3.1 Parallel blocks 

In Section 2 we used an informal notation (A | B); C to describe
our desired execution plan, and used that plan to desugar the do
expression. In this section we formalise that notation as a simple
and independently-useful extension of do-syntax.

Figure [2] gives the new (abstract) syntax. Note the following 
points: 

• For expressions and patterns we omit everything except the 
forms we use in our translation; hence the “. . .”. 

• In our abstract syntax, an expression do l e represents a do
expression with statements l that ends in return e or pure e.
If the original source do expression does not end in return e
or pure e then we can transform it so that it does, by
introducing a dummy variable. For example, do {x <- A; B}
would be represented as do {x <- A; y <- B} y in our abstract
syntax, where y is a fresh variable.

• A statement s is either a single statement (bind, guard, or let),
or it is a parallel block (l1 | ... | ln), where each li is again a
sequence of statements.

These parallel blocks are not written by the programmer; rather, 
they are introduced by our rearrangement algorithm. Their meaning 
is simple: a block (l1 | ... | ln) means the same as the statement
sequence l1 ++ ... ++ ln, where ++ appends two sequences of
statements. Thus, for example, these two mean the same thing:

do (a <- A | b <- B)        do a <- A
   c <- C                      b <- B
   (d <- D | e <- E)           c <- C
                               d <- D
                               e <- E

In short, flattening all parallel blocks does not change the meaning
of the program.

As the syntax suggests, though, a parallel block requires that no
result computed by li is required by any of the other blocks lj. This
is enforced by a simple scoping limitation: the variables bound in
li are not in scope in lj when i ≠ j. In the above example, a is not
in scope in B, nor vice versa. Similarly, a, b, and c are all in scope
in D but e is not.

The independence of the li in a parallel block means that the
desugaring algorithm is free to combine them with Applicative
combinators. To be concrete, the parallel block (l1 | ... | ln) is
equivalent to the statement

(p1, ..., pn) <- (,...,) <$> do l1 p1
                         <*> ...
                         <*> do ln pn

where pi = tuple bv(li) and bv(li) are the variables bound by
li. By the law <*> = ap, this interpretation is equivalent to flattening the
parallel block into a sequence.

Haskell aficionados should not confuse these parallel blocks
with GHC’s existing parallel list comprehensions, enabled by the 
ParallelListComp language extension. The two have the same 
scoping rules, but different semantics. There is no conflict between 
them, because parallel list comprehensions are a source-language 
feature, whereas the parallel blocks of this paper are entirely in- 
ternal. So we simply ignore parallel list comprehensions for the 
purposes of this paper. 

3.2 Rearrangement 

The algorithm for rearrangement is given in Figure 4. The function
rearrange applies to the sequence of statements l in a do expression
which contains no parallel forms, and it returns a new sequence
in which the parallel form is used “as much as possible” (we will
formalise this in Section 4).

rearrange {s1; ...; sn} = {s1},                            if n = 1
                        = split g1,                         if k = 1
                        = {(split g1 | ... | split gk)},    otherwise
  where
    g1 ... gk = segments {s1; ...; sn}

segments {s1; ...; sn} = {s1; ...; s_i1} ... {s_(i(k-1)+1); ...; s_ik}
  where
    i1 ... ik = {i ∈ 1 ... n | bv{s1; ...; si} ∩ fv{s(i+1); ...; sn} = ∅}

split {s1; ...; sn} = {s1},            if n = 1
                    = splitat i_opt,    otherwise
  where
    splitat i = rearrange {s1; ...; si} ++ rearrange {s(i+1); ...; sn}
    i_opt ∈ 1 ... n such that
      ∀j. 1 <= j < n. cost_s (splitat j) >= cost_s (splitat i_opt)

cost_s {s1; ...; sn} = Σ {cost_a si | 1 <= i <= n}
cost_a (p <- e) = 1
cost_a (l1 | ... | ln) = max {cost_s li | 1 <= i <= n}

fv {s1; ...; sn} = the free variables of s1 ... sn
bv {s1; ...; sn} = the variables bound by s1 ... sn

Figure 4. Rearrangement: introduce parallel statements

Let us consider this example:

do x1 <- A
   x2 <- B[x1]
   x3 <- C
   return (x2,x3)

Rearrangement ignores the final expression, return (x2,x3) in
this case, and considers only the list of statements. The first step is
to split the list into segments, as defined by the segments function
in Figure 4. We define segments according to where their
boundaries are: there is a segment boundary after statement i in
the sequence whenever none of the variables defined by statements
s1 ... si are used in the following statements, s(i+1) ... sn. Intuitively,
we are looking for the places in the sequence that have no
dependencies crossing them, which are exactly the places we can
split the sequence to use the applicative <*> operator.

The dependencies of our expression look like this: 

A -> B    C

A segment boundary is a place in the sequence that has no arrows 
crossing it. In our case there is only one such place: between the 
statements B and C. From the definition of rearrange this gives 

(split {x1 <- A; x2 <- B[x1]} | split {x3 <- C})

Next, split deals with a single segment. By the definition of
segments we cannot split this segment into independent sub-segments,
so we have no alternative but to divide it into two sub-sequences
and combine them with >>=. The question is, at which
point should we divide the sequence? There is no way to tell locally
which is the best spot to split it, so we exhaustively test the possibilities
and pick the best (or one of the best, since there might be more
than one). Alternatives are evaluated using a simple cost function,
which assumes a parallel execution model in which each statement
has unit cost. Note that there is a more efficient implementation of
this algorithm that we discuss in Section 4.3.

In our example, there is only one choice for the split boundary
in the left segment, and the right segment has a single statement
so is returned by split unchanged. The two recursive calls to rearrange
are both on single statements, each of which is returned
unchanged, leaving the final result:

({x1 <- A; x2 <- B[x1]} | {x3 <- C})

In Section 3.4 we will consider a more complex example where
the search for an optimal split point in split comes into play. 
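The rearrangement stage can be prototyped in a few lines. The following is a sketch under simplifying assumptions, not GHC's implementation: statements are reduced to a label plus the variables they bind and use, variables are never rebound (so shadowing is ignored), only bind statements are handled, and the exhaustive search over split points is done naively. The Stmt type and all function names are ours.

```haskell
import Data.List (intersect, minimumBy)
import Data.Ord (comparing)

-- A statement, reduced to what rearrangement inspects.
data Stmt
  = Bind String [String] [String] -- label, variables bound, variables used
  | Par [[Stmt]]                  -- parallel block (l1 | ... | ln)
  deriving (Eq, Show)

-- Variables bound by, and variables used in, a statement sequence.
bv, fv :: [Stmt] -> [String]
bv = concatMap stmtBound
  where
    stmtBound (Bind _ bs _) = bs
    stmtBound (Par ls) = concatMap bv ls
fv = concatMap stmtUsed
  where
    stmtUsed (Bind _ _ us) = us
    stmtUsed (Par ls) = concatMap fv ls

-- Cut the sequence wherever no dependency crosses the cut.
segments :: [Stmt] -> [[Stmt]]
segments = go []
  where
    go acc [] = [reverse acc | not (null acc)]
    go acc (s : rest)
      | null (bv acc' `intersect` fv rest) = reverse acc' : go [] rest
      | otherwise = go acc' rest
      where
        acc' = s : acc

-- Unit cost per bind; a parallel block costs as much as its slowest branch.
cost :: [Stmt] -> Int
cost = sum . map stmtCost
  where
    stmtCost (Bind _ _ _) = 1
    stmtCost (Par ls) = maximum (map cost ls)

rearrange :: [Stmt] -> [Stmt]
rearrange [] = []
rearrange [s] = [s]
rearrange ss = case segments ss of
  [g] -> split g
  gs -> [Par (map split gs)]

-- Try every split point of a single segment and keep a cheapest result.
split :: [Stmt] -> [Stmt]
split ss
  | length ss <= 1 = ss
  | otherwise = minimumBy (comparing cost) (map splitAtPoint [1 .. length ss - 1])
  where
    splitAtPoint i = let (l, r) = splitAt i ss in rearrange l ++ rearrange r

-- The running example: x1 <- A; x2 <- B[x1]; x3 <- C
runningExample :: [Stmt]
runningExample =
  [ Bind "A" ["x1"] []
  , Bind "B" ["x2"] ["x1"]
  , Bind "C" ["x3"] []
  ]
```

Evaluating rearrange runningExample in GHCi should yield a single Par block whose branches are the sequence A; B and the single statement C, matching the result derived above.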

3.3 Desugaring 

The next stage is desugaring, where we turn our tree of state- 
ments into a concrete expression, using the operators from the 
Applicative and Monad classes. 

Figure 5 gives the desugaring for a rearranged do expression.
For an expression do l e, the call (desugar l e) produces an
equivalent expression that does not use do at the outer level. In
the call (desugar l e) we will call e the continuation; it is the
expression that forms the return value after l has performed its 
effects and bound any variables mentioned in e. 

There are five cases in desugar: the empty case, two cases for 
bind, and two cases that deal with parallel statements. The two 
cases that explicitly match on a singleton statement, (1) and (3), 
are required for building expressions that require only Functor 
or Applicative respectively. Without these two rules, desugar 
would still produce a valid result, but it would require a Monad 
constraint in some cases where one is unnecessary. 

Our running example will help to illustrate the process of desugaring.
Applying desugar to the expression after rearrangement:

desugar {({x1 <- A; x2 <- B[x1]} | {x3 <- C})} (x2,x3)

requires rule (3), yielding the applicative expression

(\x2 x3 -> (x2,x3))
  <$> desugar {x1 <- A; x2 <- B[x1]} x2
  <*> desugar {x3 <- C} x3

Each element li of the parallel composition becomes an argument
of the applicative expression. For each li, the function desugar_args
returns a pair of (a) the pattern to use in the lambda, and (b) the
expression to use in the argument position. For the pattern, we form
a tuple of the variables that are both defined by li and used in the
continuation. In our example, the first argument defines both x1 and
x2, but of these only x2 is used in the continuation (x2,x3), so
the pattern becomes x2 (a tuple of one term is the term itself). The
second argument defines only x3, so that becomes the pattern. The
function on the left of <$> is a lambda expression with patterns for
each of the arguments (here x2 and x3), and the body of the lambda
is the continuation.

In the arguments of the applicative expression we now have 
recursive calls to desugar, so let’s consider the first of those: 

desugar {x1 <- A; x2 <- B[x1]} x2

This requires rule (2), yielding

A >>= \x1 -> desugar {x2 <- B[x1]} x2

Next, the inner desugar call hits rule (1), specifically the first case
since x2 = x2, yielding just B[x1]. This special case of rule (1)
avoids leaving an unnecessary call to <$> in the output, which
might be difficult to optimise away later.

The other recursive desugar call reduces in a similar way, leading
to this overall result:

(\x2 x3 -> (x2,x3))
  <$> (A >>= \x1 -> B[x1])
  <*> C

which is exactly what we wanted.

2016/3/18

desugar :: Stmts -> Expr -> Expr

(0) An empty list of statements
desugar {} e = pure e

(1) A singleton bind; use <$>
desugar {p <- e} e'
  | p == e'   = e
  | otherwise = (\p -> e') <$> e

(2) The general case for bind: use >>=
desugar {p <- e; l} e' = e >>= \p -> desugar l e'

(3) A singleton parallel block: build an applicative expression
desugar {(l1 | ... | ln)} e
  = (\p1 ... pn -> e) <$> e1 <*> ... <*> en
  where (pi, ei) = desugar_args li fv(e)

(4) A parallel block that is not the last statement:
    build an applicative expression with join
desugar {(l1 | ... | ln); s} e
  = join ((\p1 ... pn -> e') <$> e1 <*> ... <*> en)
  where
    e' = desugar s e
    (pi, ei) = desugar_args li fv(e')

desugar_args :: Stmts -> Set Var -> (Pat, Expr)
desugar_args {p <- e} vs = (p, e)
desugar_args l vs = ((v1, ..., vk), desugar l (v1, ..., vk))
  where
    v1 ... vk = bv(l) ∩ vs

Figure 5. Desugaring

3.4 A larger example 

Here is a more complicated example: 

do x1 <- A
   x2 <- B[x1]
   x3 <- C
   x4 <- D[x3]
   x5 <- E[x1,x4]
   return (x2,x4,x5)

The statements have this dependency structure:

[Diagram: A -> B, C -> D, A -> E, D -> E]

As before, we apply segments first. This time there are no 
segment boundaries, because the dependency from E to A spans the 
whole sequence. Thus we have a single segment, and we proceed 
to split. 

In split, we must try all possibilities for a split and determine 
their costs. The four possibilities are enumerated below. For con- 
ciseness in the following discussion we will refer to the statements 
by their right hand sides (A, B, etc.): 

1. Split after A, giving A; rearrange{B; C; D; E}. There are segments
B and {C; D; E}, giving the final result A; (B | {C; D; E})
(cost 4).

2. Split after B, giving rearrange{A; B}; rearrange{C; D; E}, which
reduces to A; B; C; D; E (cost 5).

3. Split after C, giving rearrange{A; B; C}; rearrange{D; E}. There
is a segment boundary on the left after B, and we end up with
({A; B} | C); D; E (cost 4).

4. Split after D, giving rearrange{A; B; C; D}; E. There is a segment
boundary after B, and the final result is ({A; B} | {C; D}); E (cost 3).

The rearrangement with the minimum cost was to split between 
D and E, which allowed us to put the two subsequences A; B and
C; D in parallel with each other.

The full result of rearrangement is 

({x1 <- A; x2 <- B[x1]} | {x3 <- C; x4 <- D[x3]});
{x5 <- E[x1,x4]}

Applying desugar results in:

join ((\(x1,x2) x4 ->
         E[x1,x4] >>= \x5 -> pure (x2,x4,x5))
      <$> (A >>= \x1 -> B[x1] >>= \x2 -> return (x1,x2))
      <*> (C >>= \x3 -> D[x3]))

Note that we determined that only x4 needed to be returned from
the sequence {x3 <- C; x4 <- D[x3]}, because desugar_args takes the
intersection of the variables defined by the sequence (x3 and x4 in
this case) with the free variables of the continuation (x1, x2, and
x4, since x5 is bound inside it), which here is the singleton set containing x4.
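To check that the rearranged and desugared form agrees with the original block, the statements can be abstracted as parameters. This is a sketch: viaDo and viaApplicative are our names, and the second function follows rules (1), (2) and (4) of Figure 5 literally, so it uses <$> where the displayed result uses >>= with return or pure.

```haskell
import Control.Monad (join)

-- The original five-statement block, desugared in the standard way.
viaDo
  :: Monad m
  => m a -> (a -> m b) -> m c -> (c -> m d) -> (a -> d -> m e) -> m (b, d, e)
viaDo a b c d e = do
  x1 <- a
  x2 <- b x1
  x3 <- c
  x4 <- d x3
  x5 <- e x1 x4
  return (x2, x4, x5)

-- The plan ({A; B} | {C; D}); E written with Applicative combinators and join.
viaApplicative
  :: Monad m
  => m a -> (a -> m b) -> m c -> (c -> m d) -> (a -> d -> m e) -> m (b, d, e)
viaApplicative a b c d e =
  join
    ( (\(x1, x2) x4 -> (\x5 -> (x2, x4, x5)) <$> e x1 x4)
        <$> (a >>= \x1 -> (\x2 -> (x1, x2)) <$> b x1)
        <*> (c >>= \x3 -> d x3)
    )
```

Instantiating both at a monad with observable effects, such as the pair ([String], a) used as a log, should give identical values and identical effect order.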

3.5 do expressions that require Functor only 

A pleasant consequence of rule (1) in Figure 5 is that a simple do
expression such as 

do x <- ask
   return (filter (==x) list)

desugars to

(\x -> filter (==x) list) <$> ask

This is a degenerate case of constructing an applicative expression, 
where we have only a single argument. With two or more indepen- 
dent statements the expression would use <*> and hence require 
an Applicative constraint, but here since we only use <$> the 
expression requires only Functor. 

Interestingly, this translation may be more efficient than the 
standard Haskell desugaring, because <$> often has a more direct 
implementation than the combination of >>= and return. For
example, consider the Functor and Monad definitions for lists: 

instance Functor [] where
  fmap = map

instance Monad [] where
  return x = [x]
  xs >>= f = [y | x <- xs, y <- f x]

So m >>= return . f involves creating an intermediate single- 
ton list [x] which is immediately deconstructed by >>=, whereas 
fmap f m does not have this intermediate singleton. 

Note that by virtue of rules (1) and (3), every do-notation ex- 
pression that ends with return or pure will be translated using 
<$>, effectively doing a little optimisation during desugaring, and 
leaving the optimiser with a little less work to do later. 
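Both points of this section can be seen in a short sketch. The names viaBind, viaFmap and filterEqual are ours; filterEqual is the hand-desugared form of the example above with ask and list taken as parameters, and its type mentions only Functor.

```haskell
-- The two ways of mapping over a list discussed above.
viaBind, viaFmap :: (a -> b) -> [a] -> [b]
viaBind f m = m >>= return . f -- builds and consumes singleton lists
viaFmap f m = f <$> m          -- maps directly

-- Desugared form of: do { x <- ask; return (filter (==x) list) }.
-- Only a Functor constraint is needed.
filterEqual :: (Functor f, Eq a) => f a -> [a] -> f [a]
filterEqual ask list = (\x -> filter (== x) list) <$> ask
```

Because the constraint is only Functor, filterEqual also works at types such as (,) Int, which is a Functor but has no Applicative instance in base.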

3.6 return and pure 

Our algorithm treats pure and return identically when they appear
as the last statement of a do (see Section 3.1) and generates
code that uses only pure. The latter is necessary so that we can
generate code that requires only Applicative rather than Monad.

This might be surprising, because do { return E } turns into 
pure E. However, return is arguably a historical legacy, born 
before the discovery of applicative functors, and nowadays we 
should really be using pure. Indeed, there are those who argue 
that return should be removed from the Monad class and given 
the static definition return = pure. 

3.7 Refutable patterns 

A refutable pattern is one which may fail to match at runtime. 
Variables and single-constructor patterns (such as tuples) are ir- 
refutable, because they cannot fail to match; patterns that refer to 
one constructor of a sum type (such as x : xs) are refutable. 

The desugaring translation in Figure 5 needs an extra rule to
handle refutable patterns:

    desugar {p <- e; l} e'               (0.5) Handle refutable patterns
      | refutable p =
          let ok p = desugar l e'
              ok _ = fail "..."
          in e >>= ok

This rule takes precedence over Rules (1) and (2) when the pattern p
is refutable. In particular this means that we cannot use <$> when
the pattern is refutable, so a refutable pattern will entail a Monad
constraint. Furthermore, future changes to Haskell are expected
to move fail out of the Monad class into a separate MonadFail
class, so this rule will result in a MonadFail constraint.
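
The effect of Rule (0.5) can be sketched by hand for the Maybe monad. The function names are hypothetical, and the sketch assumes a base library in which fail is exported from the Prelude as a MonadFail method:

```haskell
-- Source form: the pattern (x : _) can fail to match.
headOf :: Maybe [Int] -> Maybe Int
headOf m = do
  (x : _) <- m
  return x

-- Hand-written version following Rule (0.5): a local function ok
-- handles the matching case and falls back to fail otherwise.
headOfDesugared :: Maybe [Int] -> Maybe Int
headOfDesugared m =
  let ok (x : _) = return x
      ok _       = fail "pattern match failure"
  in m >>= ok
```

For Maybe, fail returns Nothing, so both functions return Nothing when given Just [].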

There is one more modification we need. The first clause of 
desugar flls only applies when p is irrefutable: 

desugar flls {p <- e} vs | not(refutable p) = (p, e) 

and we fall back to the second clause, which will use Rule (0.5) 
above. 


3.8 Extension to other statement forms 

Haskell's do notation has two additional statement forms that we
have not dealt with yet: let statements and guards (Figure 2).

The let form is dealt with straightforwardly. First, the cost 
function treats a let as having zero cost: 

cost a (let decls) = 0 

A let should have zero cost because it can only do pure compu- 
tation, and the goal of our translation is to achieve the maximal 
parallelisation of effects. Second, we must add a case to desugar: 

    desugar {let decls; l} e' =          (5) Handle let
      let decls in desugar l e'

In our implementation we add one small refinement. We observe 
that there is no benefit in having let bindings placed in parallel 
with other statements, so in the result of segments if we have any 
segments that consist only of let bindings, we concatenate those 
bindings onto an adjacent segment. This results in slightly shorter 
desugared code with no loss in parallelism. 

The guard form can be dealt with in two ways. The easiest way
is to translate it into a bind statement with a wildcard pattern: _ <- e.
That works, and yields the optimal parallelism, but it is possible to
achieve better efficiency in some cases (see Section 8.2).

3.9 Pitfalls 

We encountered two related pitfalls when applying this translation 
to real code. In Haskell today it is possible to define fmap using do 
syntax, like this: 

    instance Functor T where
      fmap f m = do x <- m; return (f x)

If we apply our applicative desugaring this becomes

    instance Functor T where
      fmap f m = f <$> m

and since <$> = fmap, the definition is now a loop. The fix is to
define fmap without using do, as fmap f m = m >>= return . f.
A similar problem arises with Applicative instances:

    instance Applicative T where
      mf <*> mx = do f <- mf; x <- mx; return (f x)

which turns into a self-recursive definition of <*>. The solution is 
to use <*> = ap (and ensure that the definition of ap itself does 
not fall into this trap!). 
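
A minimal sketch of a loop-free set of instances follows. The Counter type and tick are hypothetical examples, not taken from the paper; ap is the standard function from Control.Monad:

```haskell
import Control.Monad (ap)

-- A small state-like type used only for illustration.
newtype Counter a = Counter { runCounter :: Int -> (a, Int) }

instance Functor Counter where
  -- No do block here, so there is nothing to be rewritten into <$>.
  fmap f m = m >>= return . f

instance Applicative Counter where
  pure a = Counter (\n -> (a, n))
  (<*>)  = ap                      -- defined via >>=, not via do in this module

instance Monad Counter where
  m >>= k = Counter $ \n ->
    let (a, n') = runCounter m n
    in runCounter (k a) n'

-- Return the current counter value and increment it.
tick :: Counter Int
tick = Counter (\n -> (n, n + 1))
```

Only >>= and pure are written from scratch; fmap and <*> are derived from them without any do block in the instance bodies.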

4. Optimality of split 

The optimality of our algorithm is relative to the cost function,
which assumes a parallel execution model in which every statement
has identical cost. Statements combined with “|” are assumed to
run in parallel and thus we take the maximum of their costs, while
statements combined with “;” run serially and so we add their
costs. This corresponds closely to the execution model of the Haxl
monad, and it is sufficient to give good results for other monads
because it favours <*> over >>=. As we saw earlier (Section 2.1),
we can sometimes do better if we have more knowledge about
the exact cost of statements, but in general that knowledge is not
available.
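
The cost model just described can be written down directly. This is a sketch with a hypothetical Plan type standing in for a rearranged statement sequence; it is not the representation used in GHC:

```haskell
-- A rearranged do block: single statements combined in parallel or in sequence.
data Plan
  = Stmt String            -- one statement, unit cost
  | Parallel Plan Plan     -- the "|" combination
  | Sequential Plan Plan   -- the ";" combination

cost :: Plan -> Int
cost (Stmt _)         = 1
cost (Parallel a b)   = max (cost a) (cost b)
cost (Sequential a b) = cost a + cost b

-- (a | b) ; (c | d)
example :: Plan
example =
  Sequential (Parallel (Stmt "a") (Stmt "b"))
             (Parallel (Stmt "c") (Stmt "d"))
```

Under this model the example costs 2, whereas the fully sequential arrangement of the same four statements would cost 4.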

4.1 Optimality of outer parallelism 

Our rearrangement algorithm exploits outer parallelism first, using
the function segments in Figure 4. It is not immediately clear that
this gives optimal results, so in this Section we formalise that claim.

Consider a sequence of $n$ statements $s_1 \ldots s_n$. Let $C_{ij}$ stand
for the optimal cost of rearrangement of a subsequence $s_i \ldots s_j$.

We make no assumption about the relative costs of individual
statements, hence we let $C_{ii} = 1$. Furthermore, $C_{ij} \ge 1$ for all
non-empty subsequences $1 \le i \le j \le n$.

LEMMA 4.1 (Monotonicity). Expanding a subsequence by one
statement to the left or to the right cannot reduce the optimal cost,
and can increase it by at most 1:

\[ C_{ij} \le C_{(i-1)j} \le C_{ij} + 1 \]

\[ C_{ij} \le C_{i(j+1)} \le C_{ij} + 1 \]

Proof. The upper bound is achieved by sequentially composing
the new statement with the original subsequence: $s_{i-1} ; (s_i \ldots s_j)$
or $(s_i \ldots s_j) ; s_{j+1}$. The lower bound can be proved by induction
on the length of subsequences. The base case $1 \le C_{i(i+1)}$ trivially
follows from $C_{ij} \ge 1$ (the case where we expand to the left follows
by symmetry). For the inductive step we examine two cases:

1. The optimum in $C_{i(j+1)}$ is achieved by sequential composition
$(s_i \ldots s_k) ; (s_{k+1} \ldots s_{j+1})$ for some $i \le k \le j$. Then,

\[ C_{i(j+1)} = C_{ik} + C_{(k+1)(j+1)} \overset{(*)}{\ge} C_{ik} + C_{(k+1)j} \overset{(**)}{\ge} C_{ij}, \]

where (*) follows from $C_{(k+1)j} \le C_{(k+1)(j+1)}$ (the inductive
hypothesis), and (**) is due to the optimality of $C_{ij}$.

2. The optimum in $C_{i(j+1)}$ is achieved by parallel composition
$(s_i \ldots s_k) \mid (s_{k+1} \ldots s_{j+1})$ for some $i \le k \le j$. Then,

\[ C_{i(j+1)} = \max(C_{ik}, C_{(k+1)(j+1)}) \ge \max(C_{ik}, C_{(k+1)j}) \ge C_{ij}, \]

where the inequalities hold for the same reasons as in case 1. ■

THEOREM 4.2 (Parallelism is optimal). If a subsequence $s_i \ldots s_j$
can be split into two segments $s_i \ldots s_k$ and $s_{k+1} \ldots s_j$ with no
dependencies between them then $C_{ij} = \max(C_{ik}, C_{(k+1)j})$, and
the optimum is achieved by combining the segments using parallel
composition $(s_i \ldots s_k) \mid (s_{k+1} \ldots s_j)$.

Proof. Thanks to the Monotonicity Lemma 4.1 one can see that
$C_{ij} \ge C_{ik}$ and $C_{ij} \ge C_{(k+1)j}$, which can be combined into the
following lower bound: $C_{ij} \ge \max(C_{ik}, C_{(k+1)j})$. The parallel
composition achieves the lower bound and is therefore optimal. ■

4.2 Optimal sequential split 

When a subsequence $s_i \ldots s_j$ has no outer parallelism, we have to
use a sequential split $(s_i \ldots s_k) ; (s_{k+1} \ldots s_j)$ instead. One can
find the optimum $k$ in linear time by examining all $j - i$ splits:

\[ C_{ij} = \min_{i \le k < j} \{ C_{ik} + C_{(k+1)j} \} \]

Since there are at most $O(n^2)$ different subsequences $s_i \ldots s_j$, the
overall worst case complexity of the algorithm is $O(n^3)$. Fortu-
nately, it is often possible to avoid iterating through all values of $k$,
hence significantly improving the average case complexity.

Consider two splits $s_i ; (s_{i+1} \ldots s_j)$ and $(s_i \ldots s_{j-1}) ; s_j$.
Their costs are $L = C_{(i+1)j} + 1$ and $R = C_{i(j-1)} + 1$, respectively.

THEOREM 4.3 (Sequential split). If $L \ne R$ then $C_{ij} = \min(L, R)$
and the optimum is achieved by the split with the lower cost.

Proof. From the Monotonicity Lemma 4.1 we have:

\[ L - 1 \le C_{ij} \le L \quad \wedge \quad R - 1 \le C_{ij} \le R \]

By combining the lower bounds we get $C_{ij} \ge \max(L, R) - 1$. We
also know that $\min(L, R) + 1 \le \max(L, R)$ since $L \ne R$. Hence:

\[ C_{ij} \ge \max(L, R) - 1 \ge (\min(L, R) + 1) - 1 = \min(L, R). \]

Since $\min(L, R)$ achieves the lower bound it must be optimal. To
construct a solution with such cost we choose one of the two ex-
treme splits, namely $s_i ; (s_{i+1} \ldots s_j)$ or $(s_i \ldots s_{j-1}) ; s_j$. ■
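
As a worked instance of the argument (the numbers are chosen only for illustration), suppose $C_{(i+1)j} = 2$ and $C_{i(j-1)} = 3$. Then:

```latex
\begin{align*}
L &= C_{(i+1)j} + 1 = 3, \qquad R = C_{i(j-1)} + 1 = 4, \\
C_{ij} &\ge \max(L, R) - 1 = 3, \\
C_{ij} &\le \min(L, R) = 3.
\end{align*}
```

Hence $C_{ij} = 3$, and it is achieved by the split $s_i ; (s_{i+1} \ldots s_j)$, whose cost is $L$.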

Theorem 4.3 reduces the complexity of the sequential split to
$O(1)$ when $L \ne R$. See the example in Section 3.4 where this
optimisation could have been used to avoid checking all 4 possible
splits. The theorem is not applicable in the $L = R$ case, but we
conjecture that this case can also be solved in $O(1)$ amortized time.
We leave this for future work.

4.3 Optimising rearrangement 

The rearrangement algorithm in Figure 4 considers every partition-
ing of every segment, which means a naive implementation would
require time exponential in the length of the statement sequence.
However, since subsequences are examined multiple times, we can
apply dynamic programming. Caching the result for a subsequence
makes the algorithm as a whole $O(n^3)$: we have $O(n)$ start points,
$O(n)$ end points, and processing each subsequence is $O(n)$.

This algorithm is almost identical to the Cocke-Younger-Kasami
(CYK) parsing algorithm, which finds all the parses for a string
of length n for a context-free grammar. It works bottom-up, by 
considering sequences of unit length, then sequences of length 2, 
and so on. 

In our case, rather than finding all parses for each subsequence, 
we are only interested in the optimal parse (this does not affect 
the time complexity, only space). Furthermore, in practice, find- 
ing the top-level parallelism using segments tends to prune the 
search space considerably, and many subsequences need not be 
considered. Thus, rather than populating the matrix of possibili- 
ties bottom-up as in CYK parsing, it is better to use a lazy cache in 
which values for each subsequence are calculated if necessary and 
then cached. This is easily implemented in Haskell as a lazy array 
or map. 
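
The lazy-cache idea can be sketched with a boxed array from Data.Array, whose elements are evaluated only on demand. This is an illustration under simplifying assumptions, not GHC's implementation: a statement is modelled as a pair of bound and used variable names, variable shadowing is ignored, and only the optimal cost (not the rearranged statements) is computed:

```haskell
import Data.Array

-- A statement is modelled by the variables it binds and the variables it uses.
type Stmt = ([String], [String])

optimalCost :: [Stmt] -> Int
optimalCost []    = 0
optimalCost stmts = table ! (1, n)
  where
    n = length stmts
    s = listArray (1, n) stmts

    -- Boxed arrays are lazy in their elements, so an entry is computed only
    -- when it is first demanded and is shared afterwards.
    table :: Array (Int, Int) Int
    table = array ((1, 1), (n, n))
      [ ((i, j), go i j) | i <- [1 .. n], j <- [1 .. n] ]

    go i j
      | i > j     = 0
      | i == j    = 1
      | otherwise =
          case [ k | k <- [i .. j - 1], independent i k j ] of
            -- Theorem 4.2: an independent split is optimal.
            (k : _) -> max (table ! (i, k)) (table ! (k + 1, j))
            -- Otherwise try every sequential split.
            []      -> minimum [ table ! (i, k) + table ! (k + 1, j)
                               | k <- [i .. j - 1] ]

    -- True when nothing in s_{k+1}..s_j uses a variable bound in s_i..s_k.
    independent i k j =
      null [ () | a <- [i .. k], b <- [k + 1 .. j]
                , v <- fst (s ! a), v `elem` snd (s ! b) ]
```

Entries for subsequences that are never demanded, including those with i > j, are left unevaluated, which is the pruning effect described above.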

5. Implementation and Results 

Our implementation of applicative do-notation is included in GHC
8.0.1 as the ApplicativeDo language extension. Language exten-
sions are enabled explicitly in GHC, either by a declaration in the
source file, or by a command-line option to the compiler.
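
For example, the extension can be switched on with a LANGUAGE pragma at the top of a module (or with the -XApplicativeDo flag on the command line). The sketch below uses ZipList from Control.Applicative, which has an Applicative instance but no Monad instance; the function name is ours:

```haskell
{-# LANGUAGE ApplicativeDo #-}

import Control.Applicative (ZipList (..))

-- ZipList is not a Monad, so this do block relies on the applicative
-- translation: the two binds are independent and the block ends in pure.
pairwiseSums :: [Int] -> [Int] -> [Int]
pairwiseSums xs ys = getZipList $ do
  x <- ZipList xs
  y <- ZipList ys
  pure (x + y)
```

Removing the pragma should make the module fail to typecheck, because the standard desugaring would demand a Monad instance for ZipList.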

5.1 Implementation architecture 

The implementation follows a slightly different pattern than the
presentation in Section 3, although the overall result is the same.
There are two competing concerns in the implementation: 

• We want to perform our transformation before type infer-
ence, because it affects inferred types. Some do expressions
require only Functor (see Section 3.5), some require only
Applicative, and the rest require Monad.

Furthermore, it is useful to do desugaring during the name- 
resolution phase ( renaming ) that comes before type inference, 
because information about free variables (which is required by 
both rearrange and desugar) is readily to hand during this stage. 

• On the other hand, if there is a type error in the code, we want 
to present type errors to the user in terms of the original source 
code and not the rearranged code that our algorithm produces. 
For this reason we can’t just apply rearrange and desugar 
before typechecking, because the shape of the original code 
is lost. GHC performs desugaring for all the other syntactic 
constructs after type inference for this very reason. 

The solution we use is for rearrangement to annotate the ab-
stract syntax tree with enough information that the type checker
can infer the correct type, and so that the later desugaring phase can
produce the correct applicative expressions. We have to be careful
to strip the annotations when reporting code fragments back in the
form of errors or warnings.

Type inference needs to infer the types of the operators used by
do notation desugaring: <$>, <*>, and >>=, because GHC supports
a language extension called RebindableSyntax, in which oper-
ators needed during desugaring refer to whatever operators with
these names are in scope, rather than the specific instances of these
operators from the standard library. Even though the typechecker
is inferring the types of these operators, it must be careful that any
type errors in the code do not appear when inferring the types of
these operators, and instead are reported against constructs in the
original source code. This is rather delicate, but possible if careful
attention is paid to the order of unification when type-checking
do expressions.

5.2 Optimality vs. compile time 

The optimal algorithm we described earlier has complexity $O(n^3)$,
which can have a severe impact on compile time for larger do
expressions (we will give some figures shortly). Out of a desire for
more predictable compile times, we also implemented a heuristic
version of our algorithm that improves the complexity to $O(n^2)$ at
the expense of optimality in some cases.

The heuristic version of the algorithm abandons the exhaustive 
search in split in favour of a local decision: we split the sequence 
after the longest initial subsequence of mutually-independent state- 
ments. Since we never examine a subsequence multiple times, this 
also avoids the need to use dynamic programming. 
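
The local decision can be sketched as follows. This is our reading of the rule stated above rather than GHC's code, and it reuses the simplified model in which a statement is a pair of bound and used variable names:

```haskell
-- A statement is modelled by the variables it binds and the variables it uses.
type Stmt = ([String], [String])

-- Split after the longest initial run of statements in which no statement
-- uses a variable bound by an earlier statement of the same run.
splitIndependentPrefix :: [Stmt] -> ([Stmt], [Stmt])
splitIndependentPrefix = go []
  where
    go _ [] = ([], [])
    go bound (stmt@(binds, uses) : rest)
      | any (`elem` bound) uses = ([], stmt : rest)
      | otherwise =
          let (prefix, suffix) = go (binds ++ bound) rest
          in (stmt : prefix, suffix)
```

Each statement is visited once per call, and the first statement always lands in the prefix because nothing has been bound before it.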

This policy was arrived at after considering examples that arose 
in the wild, and tends to do well: we achieve the optimal result in 
about 98% of cases (measurements will be presented in more detail 
in the next section). The heuristic algorithm is currently the default 
in GHC, while the optimal one is available as an option. 

One could imagine alternative heuristics that might produce 
better results. For example, we could use the optimal split for short 
sequences but a local decision for larger ones. We leave for future 
work a more thorough investigation of alternatives here. 

5.3 Results: how often does ApplicativeDo apply? 

We tested ApplicativeDo on two large codebases: 

• 118f|^| Haskell packages from LTS Stackage 3.2 (footnote 5). In total the
Haskell code in these packages contained 38,850 do expres- 
sions, of which 16,293 (41.9%) included at least one use of 
<*> when translated by ApplicativeDo. Furthermore, 10,899 
(28.0%) were fully desugared into Applicative and Functor 
combinators and thus would not require a Monad constraint. 

The optimal algorithm found a better rearrangement than the 
heuristic algorithm in 226 cases, which is 0.6% of all do expres- 
sions, and 1.4% of those where ApplicativeDo introduced 
<*>. 

• The Haxl codebase at Facebook. Here there were 28,273 do
expressions; ApplicativeDo used at least one <*> in 5,498
cases (19.4%), and 7,600 (26.9%) were fully desugared into
Applicative and/or Functor.

The optimal algorithm found a better rearrangement in 141 
cases, which is 0.4% of all do expressions, and 2.6% of those 
where ApplicativeDo introduced <*>. 


4 About 160 packages failed to compile, mostly due to missing C library 
dependencies on the host platform. 

5 https://www.stackage.org/lts-3.2

Figure 6. Frequency of do expression costs in Stackage packages,
before and after ApplicativeDo


Figure 6 is a histogram with our cost measure on the x axis
and the number of do expressions with that cost on the logarith- 
mic y axis, for the Stackage codebase. There are two data sets: 
first without applying ApplicativeDo (the dotted lines, where the 
heavier dotted line is a moving average of width 4) and after ap- 
plying the heuristic ApplicativeDo (the solid red line). Without 
ApplicativeDo, the cost is equal to the number of statements in 
the sequence. We truncated the x axis at 50 to give a better view of 
the more common sizes to the left; in fact there were a few extreme 
outliers with costs over 300. 

Without ApplicativeDo, the median cost is 2 and the 99th
percentile is 30, and after ApplicativeDo the median cost is also
2, although the 99th percentile is 6. It is clear from these results
that ApplicativeDo finds plenty of opportunity for parallelism in
the do expressions that occur in typical Haskell code.

There is a strangely regular pattern of spikes in the pre- 
ApplicativeDo data. We investigated this, and it turned out to 
be due to derived instances of the Read class in automatically- 
generated code in the amazonka family of packages. Derived Read 
generates do expressions for parsing, and these packages contain a 
lot of data types with similar shapes. 

5.4 Compile-time overhead 

Worst case. We measured the compile time for a single file con- 
taining a do expression with 1000 statements in which each state- 
ment depends on the previous one, so that there are no segments. 
Compiling this file without optimisation: 


    without ApplicativeDo             1.22s
    with ApplicativeDo (heuristic)    1.46s (20% slower)
    with ApplicativeDo (optimal)      55.5s (4549% slower)


Note that in all cases the code being generated is the same, 
because there are no opportunities for ApplicativeDo to in- 
troduce the <*> operator, so the overhead is due purely to the 
ApplicativeDo algorithm itself. 

This is only one data point, and we can make both versions
of ApplicativeDo arbitrarily slow by using a large enough do
expression. But 1000 statements is extremely rare (in LTS Stackage
3.2 the largest was 302), so our heuristic algorithm will not have a
noticeable effect on compile-time. However, the optimal version of
the algorithm can have a significant effect on compile time (at 300
statements it imposes a 400% overhead), which is why we left it as
an option.


Average case. We measured the compile-time overhead of both 
variants of ApplicativeDo for our Haxl codebase. We measure 
unoptimised compile-time, so as not to dilute the compile-time 
with the extra cost of optimisation. These measurements were the 
average of three complete compiles, and we give error bounds to 2 
standard deviations: 


    without ApplicativeDo             450s +/- 2s
    with ApplicativeDo (heuristic)    449s +/- 2s
    with ApplicativeDo (optimal)      449s +/- 2s


There was essentially no measurable difference between the
three modes. Neither the heuristic nor the optimal ApplicativeDo
algorithm has any measurable impact on the compile time for this
codebase.

We did not measure compile time for the Stackage codebase, 
because the build system performs a lot of activities that are not 
compiling Haskell files (configuration, installing packages, and so 
forth), so it was not possible to get a meaningful measurement. 


6. Applications of applicative do 

Haskell's do-notation does not add new expressive power to the lan-
guage; it is just syntactic sugar. But it is powerful syntactic sugar,
and in practice do-notation is ubiquitous in Haskell programs. By
extending do-notation to applicative functors we make two main
gains. First, we can use do-notation for types that are Applicative
but not Monad. Second, even where the type is a Monad there may
be compelling efficiency reasons for wanting to use Applicative
combinators wherever possible. Our main example, Haxl, gains
parallelism thereby, but there are monads where the program is
asymptotically more efficient if you use Applicative combina-
tors. In this section we review examples of both these gains.

6.1 Parsing command-line options 

The optparse-applicative package is a library for parsing
command-line options. It provides an Applicative (but not
Monad) abstraction which serves two purposes: it builds the data
structure representing the options, while at the same time specify-
ing how to parse them. Here is how it looks without ApplicativeDo:

    data Options = Options
      { input   :: FilePath
      , verbose :: Bool }


5.5 Performance improvement 

Sigma is a general detection system at Facebook. Amongst other
things, it classifies actions on Facebook to detect spam and other
kinds of abuse. Sigma handles over one million requests per second
using many machines across Facebook’s different data centers.

Classification is performed by a set of rules, which are imple- 
mented in Haskell using the Haxl framework and a set of libraries 
developed for interacting with other back-end services. The rule 
code uses do notation, and the ApplicativeDo transformation en- 
sures that this code exploits the Applicative operators that allow 
data-fetching requests to be batched and overlapped with the Haxl 
monad. 

It is difficult to get an accurate measure of the benefit obtained 
from ApplicativeDo, because there are a huge number of vari- 
ables. The effect we want to measure is the difference in concur- 
rency when accessing external systems, which is inherently unpre- 
dictable: those other systems have their own varying performance 
characteristics due to caching and load differences. Moreover, the 
underlying data may change, so requests cannot be reliably re- 
played. 

With these variables in mind, we measured Sigma performance 
as follows. We measured three common request types indepen- 
dently (Sigma handles hundreds of different requests), to eliminate 
differences in workload mix. For each request type, we took a sam- 
ple of recent production requests, and measured the average latency 
of these requests with and without ApplicativeDo. We ran Sigma 
in single-threaded mode — normally Sigma runs with many threads 
processing requests in parallel, but for our purposes that would in- 
troduce more variables and obscure the latency difference we are 
trying to measure. Each separate test had to use a brand new sam- 
ple of traffic, to mitigate the effects of external caching. We used 
a large enough traffic sample that the run lasted several minutes in 
each case, to mitigate the effects of differences in the samples. 

• In request type 1 (typical latency around 150ms) there was a 
44% improvement in average latency with ApplicativeDo. 

• In request type 2 (typical latency around 125ms), there was a 
34% improvement in average latency with ApplicativeDo. 

• In request type 3 (typical latency around 12ms), there was a 
22% improvement in average latency with ApplicativeDo. 


    options :: Parser Options
    options =
      (\input_ verbose_ -> Options { input   = input_
                                   , verbose = verbose_ })
        <$> strOption ( long "input"
                     <> help "Input file" )
        <*> switch    ( long "verbose"
                     <> help "Whether to be verbose" )

Here, options specifies a parser for two options, --input and
--verbose, and a data structure, Options, to hold their values.

The problem is that we want to define the Options type us- 
ing record syntax because it’s more extensible, but using record 
syntax in the parser is cumbersome. We have to match the order 
of the arguments in the applicative expression with the order of the 
lambda-bound variables, which can become error prone when there 
are many options. For this reason people often abandon record syn- 
tax when building parsers for optparse-applicative, but that 
also sacrifices easy extensibility. 

Using do notation with ApplicativeDo is cleaner and more 
extensible: 

    options :: Parser Options
    options = do
      i_val <- strOption ( long "input"
                        <> help "Input file" )
      v_val <- switch    ( long "verbose"
                        <> help "Whether to be verbose" )
      return Options
        { input   = i_val
        , verbose = v_val }

6.2 The Seq data type 

In the containers package, Seq provides a length-annotated
finger-tree m as a general-purpose catenable sequence type. The
>>= operation for Seq behaves in the same way as lists: it applies
the second argument for each element of the sequence returned by
the first argument, and so has complexity $O(mn)$.

The <*> operation, on the other hand, can exploit the fact that
it will be concatenating many trees of the same size, and by using
lazy evaluation is able to provide access to a single element of the
result in at most $O(m + \log n)$, even though accessing the whole of
the result is still $O(mn)$. For example, with ApplicativeDo, this:

    take 10 $ reverse $
      do { x <- a; y <- b; return (x+y) }

is instantaneous, but without ApplicativeDo it requires the full
$O(mn)$ where m and n are the lengths of a and b. Of course
we could write this explicitly using <*>, but the do notation is 
clearer and allows us to use real Monad bind when necessary too. 
Seq also provides an efficient <$>, which our new desugaring takes 
advantage of. 
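
A self-contained version of this example might look as follows. It is a sketch that assumes the containers package is available; the function name is hypothetical, and Seq.reverse followed by toList stands in for the list-style reverse and take in the fragment above:

```haskell
{-# LANGUAGE ApplicativeDo #-}

import Data.Foldable (toList)
import Data.Sequence (Seq)
import qualified Data.Sequence as Seq

-- The last ten pairwise sums, in reverse order. With ApplicativeDo the do
-- block is translated to <$> and <*> on Seq rather than to >>=.
lastSums :: Seq Int -> Seq Int -> [Int]
lastSums a b = take 10 . toList . Seq.reverse $ do
  x <- a
  y <- b
  return (x + y)
```

The result is the same with or without the extension; only the amount of work needed to reach the first few elements is expected to differ.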

6.3 Probabilistic programming 

Given a monad for probabilistic programming such as cm, where 
monadic values represent a probabilistic model, <*> computes the 
product of two independent models. Sampling from such a model 
can be done by sampling from each model independently. 

6.4 LL(1) parsing 

Swierstra and Duponcheel flTTl described a non-monadic LL(1)
parser that is guaranteed to parse a proper LL(1) grammar in linear
time. It does so by tracking a FIRST set for each parser. It is also
capable of checking if such a parser is really LL(1) or if it contains
FIRST/FIRST or FIRST/FOLLOW set conflicts.

The FIRST set for a parser contains the set of terminals that this 
parser is able to accept as the first symbol of a successful parse and 
a flag to indicate whether or not an empty parse will be accepted. 

Such a parser extends to a Monad at the cost of the linear 
time guarantee and ability to check a parser for FIRST/FIRST and 
FIRST/FOLLOW conflicts, while retaining this guarantee for the 
Applicative fragment. 

6.5 Heap of successes parsing 

We can modify Wadler’s “List of Successes” parser ODD in two
ways to allow for more efficient Applicative parsing in the pres-
ence of heavy non-determinism.

    newtype Parser a = Parser (String -> [(a, String)])

Borrowing the notion of an update monad from Ahman and
Uustalu on, instead of giving back the new String, we can give
back how much of the string we’ve consumed, and between parse
steps drop this many characters from the String. This costs us
the ability to “push back” input we haven't actually consumed, but
opens up the next option.

    newtype UpdateParser a = Parser (String -> [(a, Int)])

Next we can track a heap of successes rather than a list, sorted 
by length. 

    newtype HeapParser a = Parser (String -> IntHeap [a])

Now, code written using <*> needs only execute the right hand 
parser once per distinct length, rather than once per distinct parse. 
By further augmenting such a structure, we could recover the orig- 
inal parse order. 

6.6 Moore machines 

A possibly-infinite Moore machine with states labeled by b and 
transitions labeled by a can be represented with explicit state as the 
following GADT: 

    data Moore a b where
      Moore :: (r -> b) -> (r -> a -> r) -> r -> Moore a b

    instance Applicative (Moore a) where
      pure a = Moore (const a) const ()
      Moore xf bxx xz <*> Moore ya byy yz = Moore
        (\(x, y) -> xf x $ ya y)
        (\(x, y) b -> (bxx x b, byy y b))
        (xz, yz)

The implementation of <*> for such a machine takes the product
of the state spaces and builds a new machine (footnote 6). Another way to think
of such a machine is as a strict left fold fl51, and <*> takes two
independent folds and melds them in a single pass.

There even exists a Monad for this type, but it is grossly inef- 
ficient. It can be obtained by showing that Moore a b is naturally 
isomorphic to [a] -> b. To operate it has to record every value the 
machine is fed, and then feed each machine that labels our states 
the entire input seen thus far, just to take a single output from each 
machine. 

With ApplicativeDo, we can work fairly naturally with such 
machines without incurring the horrible overhead of the Monad, 
whenever the passes are independent. 

    sum :: Num a => Moore a a
    sum = Moore id (+) 0

    length :: Moore a Int
    length = Moore id (\x _ -> x + 1) 0

    mean :: Fractional a => Moore a a
    mean = do
      a <- sum
      b <- length
      return (a / fromIntegral b)
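
To experiment with these definitions, one also needs a Functor instance and a way to feed input to a machine. The following self-contained sketch adds both; the Functor instance and runMoore are our additions and are not part of the paper's listing:

```haskell
{-# LANGUAGE ApplicativeDo, GADTs #-}

import Prelude hiding (length, sum)
import Data.List (foldl')

data Moore a b where
  Moore :: (r -> b) -> (r -> a -> r) -> r -> Moore a b

-- Assumed instance: post-compose the output function.
instance Functor (Moore a) where
  fmap g (Moore out step s0) = Moore (g . out) step s0

instance Applicative (Moore a) where
  pure b = Moore (const b) const ()
  Moore xf bxx xz <*> Moore ya byy yz = Moore
    (\(x, y) -> xf x (ya y))
    (\(x, y) b -> (bxx x b, byy y b))
    (xz, yz)

-- Hypothetical helper: run a machine over a list in a single left fold.
runMoore :: Moore a b -> [a] -> b
runMoore (Moore out step s0) = out . foldl' step s0

sum :: Num a => Moore a a
sum = Moore id (+) 0

length :: Moore a Int
length = Moore id (\n _ -> n + 1) 0

-- There is no Monad instance here, so this do block must be translated
-- using only <$> and <*>.
mean :: Fractional a => Moore a a
mean = do
  a <- sum
  b <- length
  return (a / fromIntegral b)
```

For instance, runMoore mean applied to the list 1, 2, 3, 4 computes both the sum and the length in one pass over the input.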

That said, the Monad cannot be avoided entirely, as some com- 
putations simply require multiple passes over the data, such as com- 
puting robust statistics like median absolute deviation, which re- 
quires a pass to compute the median followed by another depen- 
dent pass to compute the median distance to the median we just 
identified. 

7. Related work 

7.1 Extracting parallelism 

Extracting parallelism automatically from programs is a much stud-
ied problem. Two approaches dominate: extracting implicit paral-
lelism from a program written in a largely-unmodified host lan-
guage, or expressing parallelism explicitly in a domain-specific lan-
guage or library, such as map/reduce El, LINQ 0 or Accelerate

m 

Applicative-do notation combines features of both: 

• The possibility of parallelism is signalled explicitly by the use 
of do notation, but 

• A lexical dependency analysis is used to figure out exactly
which statements can be run in parallel.

Moreover, in Haxl there is a runtime component too: a computa-
tion is only run in parallel if it initiates a remote data fetch.

7.2 Idiom brackets 

Idiom brackets ED provide a concise syntax for writing applicative
expressions, where ⟦f e_1 ... e_n⟧ is equivalent to

    f <$> e_1 <*> ... <*> e_n

The special syntactic form is used heavily in Idris 0 and is also
implemented in the She Haskell preprocessor 02. With ingenious
use of overloading a similar syntax can be implemented in Haskell
itself and idiom brackets have also been implemented via GHC’s
quasi-quotation extension 0.

6 To avoid leaking memory a product type strict in both arguments really
should be used instead of (,).

Compared with applicative do-notation, idiom bracket syntax
only provides an abbreviation for applicative expressions. It doesn’t
allow for a mixture of applicative and monad operations, nor does
it provide the flexibility of the do syntax when used with a pure
applicative, as we described in Section 6.

The F# language has an experimental extension (implemented
in a research branch) that supports a parallel (applicative) binding
form in F#’s computation expressions (the equivalent of do nota-
tion). The use of applicative binding is fully explicit, rather than
implicit as in our case.

7.3 Monad comprehensions 

Monad comprehensions f5j also include a “|” operator in their syntax
for statement sequences. In Monad Comprehensions, the “|” oper-
ator desugars into a call to mzip from the MonadZip class, which
for lists is equal to zip, while for other monads such as Maybe it
is equal to liftM2 (,). For monads where mzip = liftM2 (,),
the “|” syntax of Monad Comprehensions can be used to write
applicative expressions, since liftM2 = liftA2. Therefore, for
some monads, Monad Comprehensions provide an explicit way to
combine computations applicatively within a monad comprehen- 
sion. However, this is somewhat accidental, since the intention of 
the “|” operator in Monad Comprehensions is to support zipping, 
and the MonadZip class was introduced as the natural generalisa- 
tion to monads of zipping on lists. 
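
The two behaviours of mzip mentioned above can be observed directly with Control.Monad.Zip from base. The value names in this sketch are ours:

```haskell
import Control.Applicative (liftA2)
import Control.Monad (liftM2)
import Control.Monad.Zip (mzip)

-- For lists, mzip pairs elements positionally, like zip.
zippedLists :: [(Int, Char)]
zippedLists = mzip [1, 2, 3] "ab"

-- For Maybe, mzip coincides with liftM2 (,), and hence with liftA2 (,).
zippedMaybe, viaLiftM2, viaLiftA2 :: Maybe (Int, Char)
zippedMaybe = mzip (Just 1) (Just 'a')
viaLiftM2   = liftM2 (,) (Just 1) (Just 'a')
viaLiftA2   = liftA2 (,) (Just 1) (Just 'a')
```

For lists the zipping result differs from liftA2 (,), which would produce all six combinations, so the applicative reading of “|” holds only for monads of the second kind.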

It is not in general semantics-preserving to flatten the “|” oper- 
ator of a monad comprehension to a sequence, unlike in our syn- 
tax, and thus monad comprehensions cannot automatically intro- 
duce “|” via a rearrange transformation. 

While we have not done so yet, we believe it would be entirely 
possible to apply ApplicativeDo to monad comprehensions, and 
we do not anticipate any complications with doing that. 

8. Future directions 

8.1 Alpha-beta pruning 

It is possible, but unimplemented, to further reduce the search space 
for split by observing that our dynamic programming solution has 
the same structure as a minimax problem allowing us to exploit 
alpha-beta pruning j8), computing an alpha-beta bounded transpo- 
sition table rather than a classic dynamic program. 

MTD(f) d is a particularly applicable pruning technique, 
because the length of a do expression acts as a conservative upper 
bound on our cost function, our result is an integer drawn from 
a very small range, and our transposition table is considerably 
smaller than that of most games to which it has been applied. 

MTD(f) would not improve the worst-case cost of computing an
optimal solution, but based on limited experimentation, it should
bring some extreme examples, such as the one in Section 5.4, back
into line with heuristic compile times. In addition, with MTD(f), we
could allow a parameterized early cut-off, to smoothly interpolate
between the heuristic and optimal algorithms.

8.2 More elaborate desugaring 

The standard Haskell desugaring translates do { x ; y } into x >> y
rather than x >>= \_ -> y. The Applicative class provides *> as
an analogue to >> and permits it to be overridden. The desugaring
described thus far does not take advantage of either this combinator
or <*, which is also included in the class. In the case of Seq, <*>
applied to an m and n element sequence takes $O(mn)$ time to com-
pute an m * n element list. However *> takes only $O(m + \log n)$
time.

7 https://wiki.haskell.org/Idiom_brackets

Similarly, Functor since GHC 6.12 provides <$ as a potentially
more efficient version of fmap . const, and

    do { e ; return () }

should produce () <$ e to take advantage of this. Returning to
the example of Seq, () <$ e can be evaluated in $O(\log n)$ time,
rather than the $O(n)$ that fmap takes.
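
The candidate translations discussed in this section can be compared side by side. This is only a sketch with hypothetical function names; all the operators used are standard Prelude exports:

```haskell
-- do { e; return () } written three ways.
viaBind, viaFmap, viaReplace :: Monad m => m a -> m ()
viaBind    e = e >>= \_ -> return ()
viaFmap    e = fmap (const ()) e
viaReplace e = () <$ e

-- do { x; y } written with >> and with *>.
thenMonad, thenApplicative :: Monad m => m a -> m b -> m b
thenMonad       x y = x >> y
thenApplicative x y = x *> y
```

For lawful instances each group yields equal results, so the choice between them matters only for efficiency, as in the Seq figures quoted above.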
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